Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation
Authors
Abstract
Using our techniques for extracting approximate non-tandem repeats [1] on well-constructed maximal models, we derive an algorithm to find common motifs of length P that occur in N sequences with at most D differences under the edit distance metric. We compare the effectiveness of our algorithm with the more involved algorithm of Sagot [17] for edit distance on some real sequences. Her method has previously been implemented only for Hamming distance [12, 20], not for edit distance. The resulting method turns out to be simpler and more efficient, both theoretically and in practice, for moderately large P and D.
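To make the problem statement concrete, the following is a minimal sketch of the occurrence test only: it checks whether a given candidate motif appears in every sequence with at most D differences under edit distance. It is not the algorithm described in the abstract, nor Sagot's; it uses the standard dynamic-programming recurrence for approximate substring matching, and the function names `min_substring_edit_distance` and `is_common_motif` are hypothetical, chosen for illustration.

```python
def min_substring_edit_distance(motif, seq):
    """Smallest edit distance between `motif` and any substring of `seq`.

    Standard approximate-matching DP: a match may start anywhere in `seq`
    (row 0 is all zeros) and may end anywhere (we take the minimum over
    the last row).
    """
    m = len(motif)
    # prev[i] = distance between motif[:i] and the empty substring.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in seq:
        curr = [0]  # an occurrence may start at any position for free
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == ch else 1
            curr.append(min(prev[i - 1] + cost,  # match or substitution
                            prev[i] + 1,         # extra character in seq
                            curr[i - 1] + 1))    # motif character missing
        best = min(best, curr[m])
        prev = curr
    return best


def is_common_motif(motif, sequences, d):
    """True if `motif` occurs in every sequence with at most `d` differences."""
    return all(min_substring_edit_distance(motif, s) <= d for s in sequences)


if __name__ == "__main__":
    seqs = ["TTACGTTT", "TTAGGTTT", "TTAGTTT"]
    print(is_common_motif("ACGT", seqs, 1))
    print(is_common_motif("ACGT", seqs, 0))
```

Testing every possible motif of length P this way is exponential in P, which is why dedicated extraction algorithms such as the ones compared in the abstract are needed; the sketch only pins down what "occurs with at most D differences" means.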
Similar resources
Finding Higher Order Motifs under the Levenshtein Measure
We study the problem of finding higher order motifs under the Levenshtein measure, otherwise known as the edit distance. In the problem set-up, we are given sequences, each of average length , over a finite alphabet and thresholds and , we are to find composite motifs that contain motifs of length (these motifs occur with at most differences) in distinct sequences. Two interesting but involved a...
Extracting semantic clusters from the alignment of definitions
Through the alignment of definitions from two or more different sources, it is possible to retrieve pairs of words that can be used indistinguishably in the same sentence without changing the meaning of the concept. As lexicographic work exploits common defining schemes, such as genus and differentia, a concept is similarly defined by different dictionaries. The difference in words used betw...
Norwegian Dialects Examined Perceptually and Acoustically WILBERT HEERINGA
Gooskens (2003) described an experiment which determined linguistic distances between 15 Norwegian dialects as perceived by Norwegian listeners. The results are compared to Levenshtein distances, calculated on the basis of transcriptions (of the words) of the same recordings as used in the perception experiment. The Levenshtein distance is equal to the sum of the weights of the insertions, dele...
A Fast Algorithm for the Inexact Characteristic String Problem
We present a new algorithm to solve the INEXACT CHARACTERISTIC STRING PROBLEM using Hamming distance instead of Levenshtein distance as a measure. We embed our new algorithm and the previously known algorithm for Levenshtein distance in a common framework which reveals an additional improvement to the Levenshtein distance algorithm. The INEXACT CHARACTERISTIC STRING PROBLEM can thus be solved i...
An Analytical Study on Calligraphic, Human and Vegetal Motifs in Some Examples of Enameled Glasses in Egypt and Syria (Mamluke Period) in Comparison with Iranian Metalwork (Ilkhanid and Timurid Periods)
Throughout history, artworks in the fields of metalwork and glasswork have reflected different themes. They are considered important means of manifesting Islamic art and traditional crafts in different countries, which have produced a wide variety of art products. Meanwhile, the influence of some kinds of artworks from different lands and the counterinfluence of concepts and artistic themes amo...